Churn prevention allows companies to develop loyalty programs and retention campaigns to keep as many customers as possible.

In this example, we use customer data from a bank to build a predictive model that identifies the clients most likely to churn.

The variable to be predicted is binary (churn or loyal), so this is a classification problem.

The goal here is to model churn probability, conditioned on the customer features.

Import data

The data comes from a modelling competition and contains sensitive information.

EDA

Categorical Data

Non-Float Numerical Data

Float Numerical Data

Feature Engineering

Impute Missing Values

Fill Missing Float Numerical Values with the Median

Fill Missing Categorical and Non-Float Numerical Values with the Mode
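
The two imputation rules above can be sketched on a toy frame. The column names (`balance`, `num_products`, `gender`) are hypothetical and are not taken from the competition data; this is a minimal illustration, not the notebook's original code.

```python
import numpy as np
import pandas as pd

# Toy data; all column names are assumptions made for illustration.
df = pd.DataFrame({
    "balance": [100.5, np.nan, 250.0, 80.25],   # float numerical
    "num_products": [1, 2, np.nan, 2],          # non-float numerical (float dtype only because of NaN)
    "gender": ["F", "M", None, "F"],            # categorical
})

float_cols = ["balance"]
mode_cols = ["num_products", "gender"]

# Float numerical columns: fill with the column median.
df[float_cols] = df[float_cols].fillna(df[float_cols].median())

# Categorical and non-float numerical columns: fill with the most frequent value.
for col in mode_cols:
    df[col] = df[col].fillna(df[col].mode().iloc[0])

# Restore the integer type once no missing values remain.
df["num_products"] = df["num_products"].astype(int)
```

The median is less sensitive to outliers than the mean, which is one common reason to prefer it for skewed monetary columns.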

Train-Test Split
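
A minimal sketch of a stratified split with scikit-learn, assuming an imbalanced binary target. The synthetic data, the 25% test size, and the random seed are illustrative choices, not values from the original notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the bank data: 200 customers, 20% churn.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = np.array([1] * 40 + [0] * 160)

# stratify=y keeps the churn rate the same in both partitions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
```

Stratifying matters for churn data because a purely random split can leave the test set with too few churners to evaluate on.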

PCA
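
One possible way to apply PCA after the split is sketched below. Standardizing first and keeping 95% of the variance are assumptions; the source does not state how many components were retained.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic numerical features standing in for the real ones.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(150, 6))
X_test = rng.normal(size=(50, 6))

# A float n_components keeps the smallest number of components
# whose cumulative explained variance reaches that fraction.
pca_pipeline = make_pipeline(StandardScaler(), PCA(n_components=0.95))

# Fit on the training data only, then apply the same projection to the test data.
Z_train = pca_pipeline.fit_transform(X_train)
Z_test = pca_pipeline.transform(X_test)

explained = pca_pipeline.named_steps["pca"].explained_variance_ratio_.sum()
```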

One-Hot Encoding

Oversampling
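
The source does not say which oversampling method was used. The sketch below uses plain random oversampling (duplicating minority rows with replacement) as a stand-in; it is not SMOTE or any other synthetic-sample method.

```python
import numpy as np
from sklearn.utils import resample

# Synthetic imbalanced training set: 20 churners, 80 loyal customers.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 3))
y_train = np.array([1] * 20 + [0] * 80)

minority = X_train[y_train == 1]
majority = X_train[y_train == 0]

# Sample the minority class with replacement until it matches the majority size.
minority_up = resample(
    minority, replace=True, n_samples=len(majority), random_state=42
)

X_balanced = np.vstack([majority, minority_up])
y_balanced = np.concatenate([
    np.zeros(len(majority), dtype=int),
    np.ones(len(minority_up), dtype=int),
])
```

Oversampling is normally applied to the training partition only, so that the test set keeps the real class balance.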

Feature Selection

VIF
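
The variance inflation factor of feature j is 1 / (1 - R_j^2), where R_j^2 comes from regressing feature j on the other features. It equals the j-th diagonal entry of the inverse correlation matrix, which the sketch below uses. The helper names, the iterative drop strategy, and the threshold of 10 are assumptions, not details from the source.

```python
import numpy as np
import pandas as pd

def vif_scores(frame: pd.DataFrame) -> pd.Series:
    """VIF per column, taken from the diagonal of the inverse correlation matrix."""
    corr = frame.corr().to_numpy()
    return pd.Series(np.diag(np.linalg.inv(corr)), index=frame.columns)

def drop_high_vif(frame: pd.DataFrame, threshold: float = 10.0) -> pd.DataFrame:
    """Repeatedly drop the column with the largest VIF until all are within the threshold."""
    kept = frame.copy()
    while kept.shape[1] > 1:
        scores = vif_scores(kept)
        if scores.max() <= threshold:
            break
        kept = kept.drop(columns=scores.idxmax())
    return kept

# Synthetic features: "a_plus_noise" is almost a copy of "a".
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
X = pd.DataFrame({"a": a, "b": b, "a_plus_noise": a + 0.1 * rng.normal(size=500)})

scores = vif_scores(X)
X_reduced = drop_high_vif(X)
```

Dropping one column at a time and recomputing avoids removing both members of a collinear pair.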

Pearson Correlation
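
A common way to use Pearson correlation for feature selection is to drop one feature from every highly correlated pair. The sketch below assumes a cutoff of 0.9 and hypothetical column names; the original cutoff is not given in the source.

```python
import numpy as np
import pandas as pd

# Synthetic features: "age_months" is nearly a rescaled copy of "age".
rng = np.random.default_rng(0)
age = rng.normal(40, 10, size=300)
X = pd.DataFrame({
    "age": age,
    "age_months": age * 12 + rng.normal(0, 1, size=300),
    "balance": rng.normal(1000, 200, size=300),
})

corr = X.corr(method="pearson").abs()

# Keep only the strict upper triangle so each pair is inspected once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Drop the later column of any pair whose absolute correlation exceeds the cutoff.
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
X_reduced = X.drop(columns=to_drop)
```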

Chi-Square
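
Chi-square selection scores each non-negative feature (for example a one-hot column) against the class label. The sketch below uses scikit-learn's `SelectKBest` with `chi2`; the synthetic data and `k=2` are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=400)

# One binary feature that agrees with the label about 90% of the time,
# plus three binary features that are pure noise.
informative = np.where(rng.random(400) < 0.9, y, 1 - y)
noise = rng.integers(0, 2, size=(400, 3))
X = np.column_stack([informative, noise])

# chi2 requires non-negative feature values.
selector = SelectKBest(score_func=chi2, k=2).fit(X, y)
selected_mask = selector.get_support()
X_selected = selector.transform(X)
```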

Model and Evaluation

With PCA, Oversampling and One-Hot Encoding

XGBoost

Random Forest

Logistic Regression
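
As an illustration of modelling churn probability and evaluating it, the sketch below fits a logistic regression on synthetic data and scores it with ROC AUC. The data-generating process and the choice of metrics are assumptions; the original results are not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic data in which the first feature drives the churn probability.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 3))
logits = 2.0 * X[:, 0] - 1.0
y = (rng.random(600) < 1 / (1 + np.exp(-logits))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Column 1 of predict_proba holds the estimated probability of the positive (churn) class.
churn_probability = model.predict_proba(X_test)[:, 1]

auc = roc_auc_score(y_test, churn_probability)
report = classification_report(y_test, model.predict(X_test))
print(f"ROC AUC: {auc:.3f}")
print(report)
```

ROC AUC and per-class precision and recall are usually more informative than accuracy when churners are a minority.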

LightGBM

Hyperparameter Tuning (GridSearchCV)
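
A minimal `GridSearchCV` sketch follows. The source does not say which model or grid was tuned, so the random forest, the parameter grid, the ROC AUC scoring, and the 3-fold cross-validation are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data in which the first feature mostly determines the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# Small illustrative grid; every combination is evaluated with cross-validation.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    scoring="roc_auc",
    cv=3,
)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
```

After fitting, `search.best_estimator_` is the model refit on all the supplied data with the best parameter combination.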

With VIF and Oversampling

XGBoost

Random Forest

Logistic Regression

ANN

With Chi-Square Selection and PCA